<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.5: http://docutils.sourceforge.net/" />
<title>Papyros: Pythonic parallel processing</title>
<style type="text/css">

/* nuxeo doc css */
/* $Id: nuxeo_doc.css 30369 2005-12-07 09:14:45Z madarche $ */

body {
  font: 90% 'Lucida Grande', Verdana, Geneva, Lucida, Arial, Helvetica, sans-serif;
  background: #ffffff;
  color: black;
  margin: 2em;
  padding: 2em;
}

a[href] {
  color: #436976;
  background-color: transparent;
}

a.toc-backref {
  text-decoration: none;
}

h1 a[href] {
  text-decoration: none;
  color: #fcb100;
  background-color: transparent;
}

a.strong {
  font-weight: bold;
}

img {
  margin: 0;
  border: 0;
}

p {
  margin: 0.5em 0 1em 0;
  line-height: 1.5em;
}
p a {
  text-decoration: underline;
}
p a:visited {
  color: purple;
  background-color: transparent;
}
p a:active {
  color: red;
  background-color: transparent;
}
a:hover {
  text-decoration: none;
}
p img {
  border: 0;
  margin: 0;
}

h1, h2, h3, h4, h5, h6 {
  color: #003a6b;
  background-color: transparent;
  font: 100% 'Lucida Grande', Verdana, Geneva, Lucida, Arial, Helvetica, sans-serif;
  margin: 0;
  padding-top: 0.5em;
}

h1 {
  font-size: 160%;
  margin-bottom: 0.5em;
  border-bottom: 1px solid #fcb100;
}
h2 {
  font-size: 140%;
  margin-bottom: 0.5em;
  border-bottom: 1px solid #aaa;
}
h3 {
  font-size: 130%;
  margin-bottom: 0.5em;
}
h4 {
  font-size: 110%;
  font-weight: bold;
}
h5 {
  font-size: 100%;
  font-weight: bold;
}
h6 {
  font-size: 80%;
  font-weight: bold;
}

ul a, ol a {
  text-decoration: underline;
}

dt {
  font-weight: bold;
}
dt a {
  text-decoration: none;
}

dd {
  line-height: 1.5em;
  margin-bottom: 1em;
}

legend {
  background: #ffffff;
  padding: 0.5em;
}

form {
  margin: 0;
}


dl.form {
  margin: 0;
  padding: 1em;
}

dl.form dt {
  width: 30%;
  float: left;
  margin: 0;
  padding: 0 0.5em 0.5em 0;
  text-align: right;
}

input {
  font: 100% 'Lucida Grande', Verdana, Geneva, Lucida, Arial, Helvetica, sans-serif;
  color: black;
  background-color: white;
  vertical-align: middle;
}

abbr, acronym, .explain {
  color: black;
  background-color: transparent;
}

q, blockquote {
}

code, pre {
  font-family: monospace;
  font-size: 1.2em;
  display: block;
  padding: 10px;
  border: 1px solid #838183;
  background-color: #eee;
  color: #000;
  overflow: auto;
  margin: 0.5em 1em;
}

tt.docutils {
  font-size: large;
}

</style>
</head>
<body>
<div class="document" id="papyros-pythonic-parallel-processing">
<h1 class="title">Papyros: Pythonic parallel processing</h1>

<p><strong>papyros</strong> is a small platform independent parallel processing package. It provides a simple uniform interface for executing tasks concurrently in multiple threads or processes, locally or on remote hosts. This is a quickstart guide to the package; for more details, consult the <a class="reference" href="docs/index.html">documentation</a>.</p>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference" href="#requirements" id="id7">Requirements</a></li>
<li><a class="reference" href="#installation" id="id8">Installation</a></li>
<li><a class="reference" href="#basic-usage" id="id9">Basic usage</a><ul>
<li><a class="reference" href="#create-the-task-type-s" id="id10">1. Create the task type(s)</a></li>
<li><a class="reference" href="#get-a-task-dispatcher" id="id11">2. Get a task dispatcher</a></li>
<li><a class="reference" href="#generate-the-tasks-and-execute-them" id="id12">3. Generate the tasks and execute them</a></li>
</ul>
</li>
<li><a class="reference" href="#advanced-usage" id="id13">Advanced usage</a><ul>
<li><a class="reference" href="#task-queues" id="id14">Task queues</a></li>
<li><a class="reference" href="#exceptions" id="id15">Exceptions</a></li>
<li><a class="reference" href="#distributed-groups" id="id16">Distributed groups</a></li>
<li><a class="reference" href="#size-bounds" id="id17">Size bounds</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="requirements">
<h1><a class="toc-backref" href="#id7">Requirements</a></h1>
<ul class="simple">
<li><a class="reference" href="http://www.python.org/">Python</a> 2.4+</li>
<li><a class="reference" href="http://pyro.sourceforge.net/">Pyro</a> 3.7+. This is required only for distributed processing.</li>
</ul>
</div>
<div class="section" id="installation">
<h1><a class="toc-backref" href="#id8">Installation</a></h1>
<p>From this directory run with root priviledges (or specify a user-writable <tt class="docutils literal"><span class="pre">--prefix</span></tt> otherwise):</p>
<pre class="literal-block">
python setup.py install
</pre>
</div>
<div class="section" id="basic-usage">
<h1><a class="toc-backref" href="#id9">Basic usage</a></h1>
<p>The typical steps to use <tt class="docutils literal"><span class="pre">papyros</span></tt> are the following:</p>
<div class="section" id="create-the-task-type-s">
<h2><a class="toc-backref" href="#id10">1. Create the task type(s)</a></h2>
<p>A callable to be processed concurrently has to be wrapped as a <a class="reference" href="docs/papyros.task.Task-class.html">Task</a> instance. The simplest way to define a Task type is through the <a class="reference" href="docs/papyros.task.Task-class.html#subclass">Task.subclass</a> decorator factory:</p>
<pre class="literal-block">
import papyros

&#64;papyros.Task.subclass()
def factorize(n):
    '''Return the prime factors of an integer.'''
    # do the real work here
    return (n, factors)
</pre>
<p>Now <tt class="docutils literal"><span class="pre">factorize</span></tt> is a Task subclass. A specific task can be instantiated by <tt class="docutils literal"><span class="pre">factorize(12345)</span></tt>; note that this does not invoke the callable. Any number of positional and keyword arguments are allowed <a class="footnote-reference" href="#id4" id="id1">[1]</a>.</p>
<p>If you don't want to rebind the original callable's name, just call <tt class="docutils literal"><span class="pre">subclass()</span></tt> explicitly:</p>
<pre class="literal-block">
def factorize(n):
    '''Return the prime factors of an integer.'''
    # do the real work here
    return (n, factors)

FactorizeTask = papyros.Task.subclass()(factorize, 'FactorizeTask')
# factorize still refers to the function
</pre>
<p>Note that in this case the name of the subclass should be specified explicitly as the second argument to the factory.</p>
</div>
<div class="section" id="get-a-task-dispatcher">
<h2><a class="toc-backref" href="#id11">2. Get a task dispatcher</a></h2>
<p>A <a class="reference" href="docs/papyros.base.TaskDispatcher-class.html">TaskDispatcher</a> is responsible for dispatching tasks to workers. Currently three dispatcher types are provided:</p>
<ul>
<li><p class="first"><strong>MultiThreadedTaskDispatcher</strong>: Each worker is a separate thread.</p>
<pre class="literal-block">
# create a dispatcher with 4 worker threads
dispatcher = papyros.MultiThreadedTaskDispatcher(num_workers=4)
</pre>
</li>
<li><p class="first"><strong>PyroTaskDispatcher</strong>: Each worker is a separate (possibly remote) process. This dispatcher requires <a class="reference" href="http://pyro.sourceforge.net/">Pyro</a>.</p>
<p>To use a <tt class="docutils literal"><span class="pre">PyroTaskDispatcher</span></tt>, three steps have to be taken before. Each may be run on a different host of the same network.</p>
<ol class="loweralpha">
<li><p class="first">Start the Pyro name server:</p>
<pre class="literal-block">
pyro-ns
</pre>
</li>
<li><p class="first">Start the dispatcher:</p>
<pre class="literal-block">
start-papyros dispatcher [options]
</pre>
</li>
</ol>
<blockquote>
<p>Several optional switches can be specified at the command line but none is required. To see all the available options, run <tt class="docutils literal"><span class="pre">start-papyros</span> <span class="pre">dispatcher</span> <span class="pre">--help</span></tt>.</p>
</blockquote>
<ol class="loweralpha" start="3">
<li><p class="first">Start <tt class="docutils literal"><span class="pre">n</span></tt> worker processes:</p>
<pre class="literal-block">
start-papyros workers [-n NUM] [options]
</pre>
</li>
</ol>
<blockquote>
<p>If '-n' is not specified, it defaults to the number of CPUs of the current host.  To see all the available options, run <tt class="docutils literal"><span class="pre">start-papyros</span> <span class="pre">workers</span> <span class="pre">--help</span></tt>. Finally, repeat this step for all worker machines <a class="footnote-reference" href="#id5" id="id2">[2]</a>.</p>
</blockquote>
<p>Now the client can get a proxy to the dispatcher:</p>
<pre class="literal-block">
dispatcher = papyros.PyroTaskDispatcher.get_proxy()
</pre>
</li>
<li><p class="first"><strong>DummyTaskDispatcher</strong>: This is a non-concurrent version, mainly as a proof of concept and for sake of completeness:</p>
<pre class="literal-block">
dispatcher = papyros.DummyTaskDispatcher()
</pre>
</li>
</ul>
</div>
<div class="section" id="generate-the-tasks-and-execute-them">
<h2><a class="toc-backref" href="#id12">3. Generate the tasks and execute them</a></h2>
<p>The main interface to <tt class="docutils literal"><span class="pre">papyros</span></tt> is the <a class="reference" href="docs/papyros.base.TaskDispatcher-class.html#execute">execute</a> method. It accepts an iterable of tasks and returns an iterator over the results of these tasks. If the tasks are not expected to raise an exception or if exceptions are not to be handled, the method can be called simply as:</p>
<pre class="literal-block">
for n,factors in dispatcher.execute(factorize(i) for i in xrange(1, 1000)):
    print &quot;%d: %s&quot; % (n,factors)
</pre>
<p>If a task may raise an exception which should be handled without breaking the loop, call explicitly the <tt class="docutils literal"><span class="pre">next</span></tt> method of the iterator:</p>
<pre class="literal-block">
iter_results = dispatcher.execute(factorize(i) for i in xrange(1, 1000))
for i in xrange(1,1000):
    try:
        print &quot;%d: %s&quot; % iter_results.next()
    except Exception, ex:
        print &quot;Exception raised: %s&quot; % ex
</pre>
<p><tt class="docutils literal"><span class="pre">execute</span></tt> also accepts several optional arguments, including:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">ordered</span></tt>: By default, the results are not necessarily yielded in the same order their respective task was submitted. If the order should be preserved, specify <tt class="docutils literal"><span class="pre">ordered=True</span></tt>.</li>
<li><tt class="docutils literal"><span class="pre">timeout</span></tt>: If no task is completed within <tt class="docutils literal"><span class="pre">timeout</span></tt> from the previous one, a <a class="reference" href="docs/papyros.TimeoutError-class.html">TimeoutError</a> is raised. By default there is no timeout.</li>
<li><tt class="docutils literal"><span class="pre">chunksize</span></tt>: Bundles tasks in chunks of the given size when sending/receiving from workers. The less time each task is expected to take, the larger <tt class="docutils literal"><span class="pre">chunksize</span></tt> should be, otherwise the cost of communication may surpass the cost of computation.</li>
</ul>
</div>
</div>
<div class="section" id="advanced-usage">
<h1><a class="toc-backref" href="#id13">Advanced usage</a></h1>
<p><tt class="docutils literal"><span class="pre">TaskDispatcher.execute</span></tt> should cover most common needs. The rest API provides a more fine-grained control of how tasks are sent and received between the client and the dispatcher as well as other less common features.</p>
<div class="section" id="task-queues">
<h2><a class="toc-backref" href="#id14">Task queues</a></h2>
<p>Clients do not interact directly with the dispatcher; instead they interact with <a class="reference" href="docs/papyros.base.TaskQueue-class.html">TaskQueue</a> objects created by (and associated with) a dispatcher by calling <a class="reference" href="docs/papyros.base.TaskDispatcher-class.html#make_queue">make_queue</a>. Each task queue acts as a separate channel between the client and the dispatcher. A single dispatcher can handle tasks from multiple task queues while ensuring that the executed tasks return to the correct destination.</p>
<p>Given a task queue, a client can <a class="reference" href="docs/papyros.base.TaskQueue-class.html#push">push</a> a new task and <a class="reference" href="docs/papyros.base.TaskQueue-class.html#pop">pop</a> a result of an executed task. Both <tt class="docutils literal"><span class="pre">push</span></tt> and <tt class="docutils literal"><span class="pre">pop</span></tt> take an optional <tt class="docutils literal"><span class="pre">timeout</span></tt>, raising <a class="reference" href="docs/papyros.TimeoutError-class.html">TimeoutError</a> if the call is not completed within the given timeout. A client may also ask for as many results as possible within a given timeout by calling <a class="reference" href="docs/papyros.base.TaskQueue-class.html#pop_many">pop_many</a>. Finally, a task queue should be <a class="reference" href="docs/papyros.base.TaskQueue-class.html#close">closed</a> after it is not needed any more so that the dispatcher can release the resources associated with it.</p>
</div>
<div class="section" id="exceptions">
<h2><a class="toc-backref" href="#id15">Exceptions</a></h2>
<p>An exception raised while a task is being executed remotely is reraised on the client when <tt class="docutils literal"><span class="pre">execute(tasks).next()</span></tt> or <tt class="docutils literal"><span class="pre">task_queue.pop()/task_queue.pop_many()</span></tt> is called. By default, neither the task that raised the exception nor the original (remote) traceback are saved. To preserve them, pass to <a class="reference" href="docs/papyros.task.Task-class.html#subclass">Task.subclass</a> <tt class="docutils literal"><span class="pre">task_attr=True</span></tt> and <tt class="docutils literal"><span class="pre">traceback_attr=True</span></tt>, respectively. A raised exception instance then has two extra attributes: <tt class="docutils literal"><span class="pre">.task</span></tt> references the task that raised the exception <a class="footnote-reference" href="#id6" id="id3">[3]</a> and <tt class="docutils literal"><span class="pre">.traceback</span></tt> is the string representation of the original traceback:</p>
<pre class="literal-block">
try:
    print &quot;Result: %s&quot; % task_queue.pop()
except Exception, ex:
    print &quot;Exception raised on task %s: %s&quot; % (ex.task, ex)
    print &quot;Original traceback follows\n%s&quot; % ex.traceback
</pre>
</div>
<div class="section" id="distributed-groups">
<h2><a class="toc-backref" href="#id16">Distributed groups</a></h2>
<p>Often there is no reason to have more than one distributed process group; different threads and processes may all share it by creating separate task queues, either implicitly (when calling <tt class="docutils literal"><span class="pre">execute</span></tt>) or explicitly. Some times though it may be desired -- for performance, robustness, security or other reasons -- to have more than one separate dispatchers and associated worker processes. <tt class="docutils literal"><span class="pre">papyros</span></tt> supports starting and connecting to multiple distributed groups, each of them identified by a unique user-specified name. To start the dispatcher and workers for a given group, provide the <tt class="docutils literal"><span class="pre">--group</span></tt> option:</p>
<pre class="literal-block">
start-papyros dispatcher --group=MyGroup [options]
start-papyros workers --group=MyGroup [options]
</pre>
<p>A client can then access a given dispatcher by specifying its group name:</p>
<pre class="literal-block">
dispatcher = papyros.PyroTaskDispatcher.get_proxy(group='MyGroup')
</pre>
</div>
<div class="section" id="size-bounds">
<h2><a class="toc-backref" href="#id17">Size bounds</a></h2>
<p>By default, dispatchers are unbounded, meaning that an arbitrarily large number of tasks can be pushed to them. Likewise, task queues are unbounded, meaning that an arbitrarily large number of results can be returned to them, waiting to be popped. Both dispatchers and task queues can be bounded in order to limit resource consumption by specifying <tt class="docutils literal"><span class="pre">size=N</span></tt> to <a class="reference" href="docs/papyros.base.TaskDispatcher-class.html#__init__">TaskDispatcher.__init__</a> and <a class="reference" href="docs/papyros.base.TaskDispatcher-class.html#make_queue">TaskDispatcher.make_queue</a>, respectively. Trying to push to a full dispatcher or task queue will either block until it becomes non-full or raise <tt class="docutils literal"><span class="pre">TimeoutError</span></tt> if a timeout is specified.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>In the case of remote execution, the module defining the tasks (as well as all the modules it imports, recursively), have to be importable from all hosts. In the future this restriction may be raised by enabling the PYRO_MOBILE_CODE feature.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td>Actually the workers may be started before the dispatcher; in this case they will connect automatically once their dispatcher starts.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[3]</a></td><td>Actually it references a copy of the task, not the original task instance, so this attribute should not be used for identity checks.</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="footer">
<hr class="footer" />
<a class="reference" href="README.txt">View document source</a>.
Generated on: 2008-08-11.
Generated by <a class="reference" href="http://docutils.sourceforge.net/">Docutils</a> from <a class="reference" href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> source.

</div>
</body>
</html>
